Add support for native preemption retries #4342
internal/common/preemption/utils.go
Outdated
// AreRetriesEnabled determines whether preemption retries are enabled at the job level. Also returns whether the
// annotation was set.
func AreRetriesEnabled(annotations map[string]string) (bool, bool) {
I'd name the return params, so it is easier to see which value is which.
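As a rough illustration of this suggestion, the signature could use named results along these lines. This is only a sketch: the annotation key `retryEnabledAnnotation`, its value, and the function body are assumptions made for the example, since the real key and implementation are not shown in this thread.

```go
package main

import "strconv"

// retryEnabledAnnotation is a hypothetical key used only for this sketch;
// the actual annotation name is not shown in this thread.
const retryEnabledAnnotation = "example.io/preemption-retry-enabled"

// AreRetriesEnabled reports whether preemption retries are enabled at the job
// level (enabled) and whether the annotation was present at all (annotationSet).
func AreRetriesEnabled(annotations map[string]string) (enabled bool, annotationSet bool) {
	value, ok := annotations[retryEnabledAnnotation]
	if !ok {
		return false, false
	}
	parsed, err := strconv.ParseBool(value)
	if err != nil {
		// Assumption for this sketch: an unparsable value counts as set but not enabled.
		return false, true
	}
	return parsed, true
}
```

With named results, a call site such as `enabled, annotationSet := AreRetriesEnabled(job.Annotations)` mirrors the signature, and the generated documentation shows what each boolean means.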
7591f89 to 8ba4635
Hi, thanks for this, it looks great. One thing I think we should consider/discuss is whether we could add the preemption retry fields as first-class proto fields rather than relying on annotations. The reasoning here is that annotations are very easy to add at first, as they require no interface changes, but can soon get quite hard to work with because everything is just a map[string]string. Personally I'd be in favour of adding these fields to One comment I
99a74c5 to 3f819bc
I looked into this but it seems a bit difficult to keep it as an annotation and not a first-class citizen. I'm not able to create the scheduling info from an
@jparraga-stackav Thanks for the work on this.
We could incorporate a simple check on
Signed-off-by: Jason Parraga <[email protected]>
5c813fa to 287daaa
I will likely look into this as a follow-up improvement.
75306bd to dbd0cef
Signed-off-by: Jason Parraga <[email protected]>
dbd0cef to 7cf9c85
Signed-off-by: Jason Parraga <[email protected]>
@@ -1 +1 @@
-ALTER TABLE runs ADD COLUMN IF NOT EXISTS run_index bigint NOT NULL DEFAULT 0;
+ALTER TABLE runs ADD COLUMN IF NOT EXISTS run_index bigint;
During testing we found that the old scheduler ingester was trying to write rows with a null run_index, which blocked ingestion. To make the rollout smoother, we've made the column nullable and handle null values in the app layer.
What type of PR is this?
New feature
What this PR does / why we need it:
This pull request adds support for native Armada preemption retry handling. Retry handling can be configured at the platform level as a default in the scheduling config as well as with two annotations:
The scheduling algorithm has been modified to not fail jobs that are preempted if they are eligible for a retry. If the job is eligible to be retried it will be marked to be requeued.
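The decision described above can be sketched roughly as follows. All names here (`retryPolicy`, `resolvePolicy`, `decideOnPreemption`, the outcome constants) and the assumption that the policy consists of an on/off switch plus a maximum retry count are hypothetical; they illustrate the idea and are not the types used in this PR.

```go
package main

// retryPolicy is a hypothetical shape for the retry settings in this sketch.
type retryPolicy struct {
	Enabled    bool
	MaxRetries uint
}

// preemptionOutcome is what the scheduler does with a preempted job.
type preemptionOutcome int

const (
	outcomeFail    preemptionOutcome = iota // job is failed, as before this change
	outcomeRequeue                          // job is marked to be requeued
)

// resolvePolicy returns the job-level override when one is set and otherwise
// falls back to the platform default from the scheduling config.
func resolvePolicy(platformDefault retryPolicy, jobOverride *retryPolicy) retryPolicy {
	if jobOverride != nil {
		return *jobOverride
	}
	return platformDefault
}

// decideOnPreemption requeues a preempted job while retries are enabled and
// the job has used fewer retries than the policy allows; otherwise it fails.
func decideOnPreemption(policy retryPolicy, retriesUsed uint) preemptionOutcome {
	if policy.Enabled && retriesUsed < policy.MaxRetries {
		return outcomeRequeue
	}
	return outcomeFail
}
```

Under these assumptions, a job with a limit of two retries is requeued on its first and second preemption and failed on the third, which corresponds to the "exhausting retries" case.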
Unit tests are included. We've also tested this end to end in our development environment with jobs/gangs and combinations of successful retries as well as exhausting retries.
Which issue(s) this PR fixes:
Fixes: #4340
Special notes for your reviewer: